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Abstract 

This  work  presents  a  distributed  method  for  control  centers  to  monitor  the  operating  condition  of  a  power  network,  i.e.,  to 
estimate  the  network  state,  and  to  ultimately  determine  the  occurrence  of  threatening  situations.  State  estimation  has  been 
recognized  to  be  a  fundamental  task  for  network  control  centers  to  ensure  correct  and  safe  functionalities  of  power  grids. 
We  consider  (static)  state  estimation  problems,  in  which  the  state  vector  consists  of  the  voltage  magnitude  and  angle  at 
all  network  buses.  We  consider  the  state  to  be  linearly  related  to  network  measurements,  which  include  power  flows,  current 
injections,  and  voltages  phasors  at  some  buses.  We  admit  the  presence  of  several  cooperating  control  centers,  and  we  design  two 
distributed  methods  for  them  to  compute  the  minimum  variance  estimate  of  the  state  given  the  network  measurements.  The 
two  distributed  methods  rely  on  different  modes  of  cooperation  among  control  centers:  in  the  first  method  an  incremental  mode 
of  cooperation  is  used,  whereas,  in  the  second  method,  a  diffusive  interaction  is  implemented.  Our  procedures,  which  require 
each  control  center  to  know  only  the  measurements  and  structure  of  a  subpart  of  the  whole  network,  are  computationally 
efficient  and  scalable  with  respect  to  the  network  dimension,  provided  that  the  number  of  control  centers  also  increases  with 
the  network  cardinality.  Additionally,  a  finite-memory  approximation  of  our  diffusive  algorithm  is  proposed,  and  its  accuracy 
is  characterized.  Finally,  our  estimation  methods  are  exploited  to  develop  a  distributed  algorithm  to  detect  corrupted  data 
among  the  network  measurements. 


1  Introduction 

Large-scale  complex  systems,  such  as,  for  instance,  the  electrical  power  grid  and  the  telecommunication  system, 
are  receiving  increasing  attention  from  researchers  in  different  fields.  The  wide  spatial  distribution  and  the  high 
dimensionality  of  these  systems  preclude  the  use  of  centralized  solutions  to  tackle  classical  estimation,  control, 
and  fault  detection  problems,  and  they  require,  instead,  the  development  of  new  decentralized  techniques.  One 
possibility  to  overcome  these  issues  is  to  geographically  deploy  some  monitors  in  the  network,  each  one  responsible 
for  a  different  subpart  of  the  whole  system.  Local  estimation  and  control  schemes  can  successively  be  used,  together 
with  an  information  exchange  mechanism  to  recover  the  performance  of  a  centralized  scheme. 

1.1  Control  centers,  state  estimation  and  cyber  security  in  power  networks 

Power  systems  are  operated  by  system  operators  from  the  area  control  center.  The  main  goal  of  the  system  operator 
is  to  maintain  the  network  in  a  secure  operating  condition,  in  which  all  the  loads  are  supplied  power  by  the  generators 
without  violating  the  operational  limits  on  the  transmission  lines.  In  order  to  accomplish  this  goal,  at  a  given  point 
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Fig.  1.  In  Fig.  1(a),  remote  terminal  units  (RTU)  transmit  their  measurements  to  a  SCADA  terminal  via  a  LAN  network. 
The  data  is  then  sent  to  a  Control  Center  to  implement  network  estimation,  control,  and  optimization  procedures.  Fig.  1(b) 
shows  the  diagram  of  the  IEEE  118  bus  system  (courtesy  of  the  IIT  Power  Group).  The  network  has  118  buses,  186  branches, 
99  loads,  and  54  generators. 

in  time,  the  network  model  and  the  phasor  voltages  at  every  system  bus  need  to  be  determined,  and  preventive 
actions  have  to  be  taken  if  the  system  is  found  in  an  insecure  state.  For  the  determination  of  the  operating  state, 
remote  terminal  units  and  measuring  devices  are  deployed  in  the  network  to  gather  measurements.  These  devices 
are  then  connected  via  a  local  area  network  to  a  SCADA  (Supervisory  Control  and  Data  Acquisition)  terminal, 
which  supports  the  communication  of  the  collected  measurements  to  a  control  center.  At  the  control  center,  the 
measurement  data  is  used  for  control  and  optimization  functions,  such  as  contingency  analysis,  automatic  generation 
control,  load  forecasting,  optimal  power  flow  computation,  and  reactive  power  dispatch  [1],  A  diagram  representing 
the  interconnections  between  remote  terminal  units  and  the  control  center  is  reported  in  Fig.  1(a).  Various  sources 
of  uncertainties,  e.g.,  measurement  and  communication  noise,  lead  to  inaccuracies  in  the  received  data,  which  may 
affect  the  performance  of  the  control  and  optimization  algorithms,  and,  ultimately,  the  stability  of  the  power  plant. 
This  concern  was  first  recognized  and  addressed  in  [27,  28,  29]  by  introducing  the  idea  of  (static)  state  estimation 
in  power  systems. 

Power  network  state  estimators  are  broadly  used  to  obtain  an  optimal  estimate  from  redundant  noisy  measure¬ 
ments,  and  to  estimate  the  state  of  a  network  branch  which,  for  economical  or  computational  reasons,  is  not  directly 
monitored.  For  the  power  system  state  estimation  problem,  several  centralized  and  parallel  solutions  have  been  de¬ 
veloped  in  the  last  decades,  e.g.,  see  [19,  8,  30].  Being  an  online  function,  computational  issues,  storage  requirements, 
and  numerical  robustness  of  the  solution  algorithm  need  to  be  taken  into  account.  Within  this  regard,  distributed 
algorithms  based  on  network  partitioning  techniques  are  to  be  preferred  over  centralized  ones.  Moreover,  even  in 
decentralized  setting,  the  work  in  [20]  on  the  blackout  of  August  2003  suggests  that  an  estimation  of  the  entire 
network  is  essential  to  prevent  networks  damages.  In  other  words,  the  whole  state  vector  should  be  estimated  by 
and  available  to  every  unit.  The  references  [34,  12]  explore  the  idea  of  using  a  global  control  center  to  coordinate 
estimates  obtained  locally  by  several  local  control  centers.  In  this  work,  we  improve  upon  these  prior  results  by 
proposing  a  fully  decentralized  and  distributed  estimation  algorithm,  which,  by  only  assuming  local  knowledge  of 
the  network  structure  by  the  local  control  centers,  allows  them  to  obtain  in  finite  time  an  optimal  estimate  of  the 
network  state.  Being  the  computation  distributed  among  the  control  centers,  our  procedure  appears  scalable  against 
the  power  network  dimension,  and,  furthermore,  numerically  reliable  and  accurate. 

A  second  focus  of  this  paper  is  false  data  detection  and  cyber  attacks  in  power  systems.  Because  of  the  increasing 
reliance  of  modern  power  systems  on  communication  networks,  the  possibility  of  cyber  attacks  is  a  real  threat 
[18].  One  possibility  for  the  attacker  is  to  corrupt  the  data  coming  from  the  measuring  units  and  directed  to  the 
control  center,  in  order  to  introduce  arbitrary  errors  in  the  estimated  state,  and,  consequently,  to  compromise  the 
performance  of  control  and  optimization  algorithms  [14].  This  important  type  of  attack  is  often  referred  in  the 
power  systems  literature  to  as  false  data  injection  attack.  Recently,  the  authors  of  [33]  show  that  a  false  data 
injection  attack,  in  addition  to  destabilizing  the  grid,  may  also  lead  to  fluctuations  in  the  electricity  market,  causing 
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significant  economical  losses.  The  presence  of  false  data  is  classically  checked  by  analyzing  the  statistical  properties 
of  the  estimation  residual  z  —  Hx,  where  z  is  the  measurements  vector,  a;  is  a  state  estimate,  and  H  is  the  state  to 
measurements  matrix.  For  an  attack  to  be  successful,  the  residual  needs  to  remain  within  a  certain  confidence  level. 
Accordingly,  one  approach  to  circumvent  false  data  injection  attacks  is  to  increase  the  number  of  measurements  so 
as  to  obtain  a  more  accurate  confidence  bound.  Clearly,  by  increasing  the  number  of  measurements,  the  data  to  be 
transmitted  to  the  control  center  increases,  and  the  dimension  of  the  estimation  problem  grows.  By  means  of  our 
estimation  method,  we  address  this  dimensionality  problem  by  distributing  the  false  data  detection  problem  among 
several  control  centers. 

1.2  Related  work  on  distributed  estimation  and  projection  methods 

Starting  from  the  eighties,  the  problem  of  distributed  estimation  has  attracted  intense  attention  from  the  scientific 
community,  generating  along  the  years  a  very  rich  literature.  More  recently,  because  of  the  advent  of  highly  integrated 
and  low-cost  wireless  devices  as  key  components  of  large  autonomous  networks,  the  interest  for  this  classical  topic  has 
been  renewed.  For  a  wireless  sensor  network,  novel  applications  requiring  efficient  distributed  estimation  procedures 
include,  for  instance,  environment  monitoring,  surveillance,  localization,  and  target  tracking.  Considerable  effort  has 
been  devoted  to  the  development  of  distributed  and  adaptive  filtering  schemes,  which  generalize  the  notion  of  adaptive 
estimation  to  a  setup  involving  networked  sensing  and  processing  devices  [4].  In  this  context,  relevant  methods  include 
incremental  Least  Mean-Square  [15],  incremental  Recursive  Least-Square  [24],  Diffusive  Least  Mean-Square  [24],  and 
Diffusive  Recursive  Least-Square  [4].  Diffusion  Kalman  filtering  and  smoothing  algorithms  are  proposed,  for  instance, 
in  [3,  5],  and  consensus  based  techniques  in  [25,  26].  We  remark  that  the  strategies  proposed  in  the  aforementioned 
references  could  be  adapted  for  the  solution  of  the  power  network  static  estimation  problem.  Their  assumptions, 
however,  appear  to  be  not  well  suited  in  our  context  for  the  following  reasons.  First,  the  convergence  of  the  above 
estimation  algorithms  is  only  asymptotic,  and  it  depends  upon  the  communication  topology.  As  a  matter  of  fact,  for 
many  communication  topologies,  such  as  Cayley  graphs  and  random  geometric  graphs,  the  convergence  rate  is  very 
slow  and  scales  badly  with  the  network  dimension.  Such  slow  convergence  rate  is  clearly  undesirable  because  a  delayed 
state  estimation  could  lead  the  power  plant  to  instability.  Second,  approaches  based  on  Kalman  filtering  require  the 
knowledge  of  the  global  state  and  observation  model  by  all  the  components  of  the  network,  and  they  violate  therefore 
our  assumptions.  Third  and  finally,  the  application  of  these  methods  to  the  detection  of  cyber  attacks,  which  is  also 
our  goal,  is  not  straightforward,  especially  when  detection  guarantees  are  required.  An  exception  is  constituted  by 
[31],  where  a  estimation  technique  based  on  local  Kalman  filters  and  a  consensus  strategy  is  developed.  This  latter 
method,  however,  besides  exhibiting  asymptotic  convergence,  does  not  offer  guarantees  on  the  final  estimation  error. 

Our  estimation  technique  belongs  to  the  family  of  Kaczmarz  (row-projection)  methods  for  the  solution  of  a  linear 
system  of  equations.  See  [13,  11,  32,  6]  for  a  detailed  discussion.  Differently  from  the  existing  row-action  methods, 
our  algorithms  exhibit  finite  time  convergence  towards  the  exact  solution,  and  they  can  be  used  to  compute  any 
weighted  least  squares  solution  to  a  system  of  linear  equations. 

1.3  Our  contributions 

The  contributions  of  this  work  are  threefold.  First,  we  adopt  the  static  state  network  estimation  model,  in  which  the 
state  vector  is  linearly  related  to  the  network  measurements.  We  develop  two  methods  for  a  group  of  interconnected 
control  centers  to  compute  an  optimal  estimate  of  the  system  state  via  distributed  computation.  Our  first  estimation 
algorithm  assumes  an  incremental  mode  of  cooperation  among  the  control  centers,  while  our  second  estimation 
algorithm  is  based  upon  a  diffusive  strategy.  Both  methods  are  shown  to  converge  in  a  finite  number  of  iterations, 
and  to  require  only  local  information  for  their  implementation.  Differently  than  [23],  our  estimation  procedures 
assume  neither  the  measurement  error  covariance  nor  the  measurements  matrix  to  be  diagonal.  Furthermore,  our 
algorithms  are  advantageous  from  a  communication  perspective,  since  they  reduce  the  distance  between  remote 
terminal  units  and  the  associated  control  center,  and  from  a  computational  perspective,  since  they  distribute  the 
measurements  to  be  processed  among  the  control  centers.  Second,  as  a  minor  contribution,  we  describe  a  finite-time 
algorithm  to  detect  via  distributed  computation  if  the  measurements  have  been  corrupted  by  a  malignant  agent.  Our 
detection  method  is  based  upon  our  state  estimation  technique,  and  it  inherits  its  convergence  properties.  Notice  that, 
since  we  assume  the  measurements  to  be  corrupted  by  noise,  the  possibility  exists  for  an  attacker  to  compromise  the 
network  measurements  while  remaining  undetected  (by  injecting  for  instance  a  vector  with  the  same  noise  statistics). 
With  respect  to  this  limitation,  we  characterize  the  class  of  corrupted  vectors  that  are  guaranteed  to  be  detected 
by  our  procedure,  and  we  show  optimality  with  respect  to  a  centralized  detection  algorithm.  Third,  we  study  the 
scalability  of  our  methods  in  networks  of  increasing  dimension,  and  we  derive  a  finite-memory  approximation  of  our 
diffusive  estimation  strategy.  For  this  approximation  procedure  we  show  that,  under  a  reasonable  set  of  assumptions 
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and  independent  of  the  network  dimension,  each  control  center  is  able  to  recover  a  good  approximation  of  the  state 
of  a  certain  subnetwork  through  little  computation.  Moreover,  we  provide  bounds  on  the  approximation  error  for 
each  subnetwork.  Finally,  we  illustrate  the  effectiveness  of  our  procedures  on  the  IEEE  118  bus  system. 


The  rest  of  the  paper  is  organized  as  follows.  In  Section  2  we  introduce  the  problem  under  consideration,  and  we 
describe  the  mathematical  setup.  Section  3  contains  our  main  results  on  the  state  estimation  and  on  the  detection 
problem,  as  well  as  our  algorithms.  Section  4  describes  our  approximated  state  estimation  algorithm.  In  Section  5  we 
study  the  IEEE  118  bus  system,  and  we  present  some  simulation  results.  Finally,  Section  6  contains  our  conclusion. 


2  Problem  setup  and  preliminary  notions 


For  a  power  network,  an  example  of  which  is  reported  in  Fig.  1(b),  the  state  at  a  certain  instant  of  time  consists 
of  the  voltage  angles  and  magnitudes  at  all  the  system  buses.  The  (static)  state  estimation  problem  introduced 
in  the  seminal  work  by  Schweppe  [27]  refers  to  the  procedure  of  estimating  the  state  of  a  power  network  given  a 
set  of  measurements  of  the  network  variables,  such  as,  for  instance,  voltages,  currents,  and  power  flows  along  the 
transmission  lines.  To  be  more  precise,  let  x  £  Rn  and  z  £  Rp  be,  respectively,  the  state  and  measurements  vector. 
Then,  the  vectors  x  and  2  are  related  by  the  relation 

2  =  h(x)  +  77,  (1) 

where  h(-)  is  a  nonlinear  measurement  function,  and  where  77,  which  is  traditionally  assumed  to  be  a  zero  mean 
random  vector  satisfying  E[?7?7T]  =  £,,  =  £^  >  0,  is  the  noise  measurement.  An  optimal  estimate  of  the  network  state 
coincides  with  the  most  likely  vector  x  that  solves  equation  (1).  It  should  be  observed  that,  instead  of  by  solving  the 
above  estimation  problem,  the  network  state  could  be  obtained  by  measuring  directly  the  voltage  phasors  by  means 
of  phasor  measurement  devices. 1  Such  an  approach,  however,  would  be  economically  expensive,  since  it  requires  to 
deploy  a  phasor  measurement  device  at  each  network  bus,  and  it  would  be  very  vulnerable  to  communication  failures 
[1].  In  this  work,  we  adopt  the  approximated  estimation  model  presented  in  [28],  which  follows  from  the  linearization 
around  the  origin  of  equation  (1).  Specifically, 


z  =  Hx  +  v ,  (2) 

where  H  £  Mpxn  and  where  v,  the  noise  measurement,  is  such  that  E[v]  =  0  andA[uuT]  =  £  =  £T  >  0.  Observe 
that,  because  of  the  interconnection  structure  of  a  power  network,  the  measurement  matrix  H  is  usually  sparse.  Let 
Ker (H)  denote  the  null  space  defined  by  the  matrix  H.  For  the  equation  (2),  without  affecting  generality,  assume 
Ker(if)  =  {0},  and  recall  from  [17]  that  the  vector 


crwls  =  (HtE-1H)-1HtE~1z 


(3) 


minimizes  the  weighted  variance  of  the  estimation  error,  i.e. ,  xwis  =  argmin^z  —  Hx)JT,  x( z  —  Hx). 


The  centralized  computation  of  the  minimum  variance  estimate  to  (2)  assumes  the  complete  knowledge  of  the  matrices 
H  and  £,  and  it  requires  the  inversion  of  the  matrix  H .  For  a  large  power  network,  such  computation  imposes 

a  limitation  on  the  dimension  of  the  matrix  H,  and  hence  on  the  number  of  measurements  that  can  be  efficiently 
processed  to  obtain  a  real-time  state  estimate.  Since  the  performance  of  network  control  and  optimization  algorithms 
depend  upon  the  precision  of  the  state  estimate,  a  limitation  on  the  network  measurements  constitutes  a  bottleneck 
toward  the  development  of  a  more  efficient  power  grid.  A  possible  solution  to  address  this  complexity  problem  is 
to  distribute  the  computation  of  ccwis  among  geographically  deployed  control  centers  (monitors),  in  a  way  that  each 
monitor  is  responsible  for  a  subpart  of  the  whole  network.  To  be  more  precise,  let  the  matrices  H  and  £,  and  the 


1  Phasor  measurement  units  are  devices  that  synchronize  by  using  GPS  signals,  and  that  allow  for  a  direct  measurement  of 
voltage  and  current  phasors. 
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vector  2  be  partitioned  as  2 


’  Hx  ' 

"  Si  ' 

Zl 

Hi 

,  E  = 

S2 

,  z  = 

Z2 

_  Hm  _ 

_  _ 

_  %m  _ 

where,  for  i  G  {1  m,  €  N,  Hi  G  RmiXn,  Sj  G  MmiXp,  Zi  G  Rmi,  and  Y^ZLi  m*  =  P-  Let  G  =  (V,£)  be  a 

connected  graph  in  which  each  vertex  i  G  V  =  {1, . . .  ,?n}  denotes  a  monitor,  and  £  £  V  x  V  denotes  the  set  of 
monitors  interconnections.  For  i  G  {1, . . .  ,  ?n},  assume  that  monitor  i  knows  the  matrices  Hi,  and  the  vector  z^. 
Moreover,  assume  that  two  neighboring  monitors  are  allowed  to  cooperate  by  exchanging  information.  Notice  that,  if 
the  full  matrices  H  and  S  are  nowhere  available,  and  if  they  cannot  be  used  for  the  computation  of  a;wis,  then,  with 
no  cooperation  among  the  monitors,  the  vector  a;wis  cannot  be  computed  by  any  of  the  monitor.  Hence  we  consider 
the  following  problem. 


Problem  1  (Distributed  state  estimation)  Design  an  algorithm  for  the  monitors  to  compute  the  minimum  vari¬ 
ance  estimate  of  the  network  state  via  distributed  computation. 


We  now  introduce  the  second  problem  addressed  in  this  work.  Given  the  distributed  nature  of  a  power  system  and 
the  increasing  reliance  on  local  area  networks  to  transmit  data  to  a  control  center,  there  exists  the  possibility  for  an 
attacker  to  compromise  the  network  functionality  by  corrupting  the  measurements  vector.  When  a  malignant  agent 
corrupts  some  of  the  measurements,  the  state  to  measurements  relation  becomes 

z  =  Hx  +  v  +  w, 

where  the  vector  w  G  Rp  is  chosen  by  the  attacker,  and,  therefore,  it  is  unknown  and  unmeasurable  by  any  of  the 
monitoring  stations.  We  refer  to  the  vector  w  to  as  false  data.  From  the  above  equation,  it  should  be  observed  that 
there  exist  vectors  w  that  cannot  be  detected  through  the  measurements  z.  For  instance,  if  the  false  data  vector  is 
intentionally  chosen  such  that  w  G  Im(H),  then  the  attack  cannot  be  detected  through  the  measurements  z.  Indeed, 
denoting  with  1  the  pseudoinverse  operation,  the  vector  x  +  H'w  is  a  valid  network  state.  In  this  work,  we  assume 
that  the  vector  w  is  detectable  from  the  measurements  z,  and  we  consider  the  following  problem. 

Problem  2  (Distributed  detection)  Design  an  algorithm  for  the  monitors  to  detect  the  presence  of  false  data  in 
the  measurements  via  distributed  computation. 


As  it  will  be  clear  in  the  sequel,  the  complexity  of  our  methods  depends  upon  the  dimension  of  the  state,  as  well 
as  the  number  of  monitors.  In  particular,  few  monitors  should  be  used  in  the  absence  of  severe  computation  and 
communication  contraints,  while  many  monitors  are  preferred  otherwise.  We  believe  that  a  suitable  choice  of  the 
number  of  monitors  depends  upon  the  specific  scenario,  and  it  is  not  further  discussed  in  this  work. 

Remark  1  (Generality  of  our  methods)  In  this  paper  we  focus  on  the  state  estimation  and  the  false  data  detec¬ 
tion  problem  for  power  systems,  because  this  field  of  research  is  currently  receiving  sensible  attention  from  different 
communities.  The  methods  described  in  the  following  sections,  however,  are  general,  and  they  have  applicability 
beyond  the  power  network  scenario.  For  instance,  our  procedures  can  be  used  for  state  estimation  and  false  data 
detection  in  dynamical  system,  as  described  in  [21]  for  the  case  of  sensors  networks. 


3  Optimal  state  estimation  and  false  data  detection  via  distributed  computation 

The  objective  of  this  section  is  the  design  of  distributed  methods  to  compute  an  optimal  state  estimate  from 
measurements.  With  respect  to  a  centralized  method,  in  which  a  powerful  central  processor  is  in  charge  of  processing 


2  In  most  application  the  error  covariance  matrix  is  assumed  to  be  diagonal,  so  that  each  submatrix  E;  is  very  sparse. 
However,  we  do  not  impose  any  particular  structure  on  the  error  covariance  matrix. 
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Algorithm  1  Incremental  minimum  norm  solution  (i-th  monitor) 

Input:  Hi,  Zi ; 

Require:  [zj  . . .  z ^]T  G  Im ([Hj  . . .  #^]T); 
if  i  =  1  then 

x0  :=  0,  K0  :=  In; 

else 

receive  Xi— i  and  A',_i  from  monitor  i  —  1; 

end  if 

:=  *»-i  +  (zi  -  HiXi-i ); 

A",  :=  Basis(ATi„iKer(RiAri_i)); 

if  i  <  m  then 

transmit  a;,;  and  A',;  to  monitor  i  +  1; 
else 

return  xm; 
end  if 


all  the  data,  our  procedures  require  the  computing  units  to  have  access  to  only  a  subset  of  the  measurements,  and 
are  shown  to  reduce  significantly  the  computational  burden.  In  addition  to  being  convenient  for  the  implementation, 
our  methods  are  also  optimal,  in  the  sense  that  they  maintain  the  same  estimation  accuracy  of  a  centralized  method. 

For  a  distributed  method  to  be  implemented,  the  interaction  structure  among  the  computing  units  needs  to  be 
defined.  Here  we  consider  two  modes  of  cooperations  among  the  computing  units,  and,  namely,  the  incremental  and 
the  diffusive  interaction.  In  an  incremental  mode  of  cooperation,  information  flows  in  a  sequential  manner  from  one 
node  to  the  adjacent  one.  This  setting,  which  usually  requires  the  least  amount  of  communications  [22],  induces  a 
cyclic  interaction  graph  among  the  processors.  In  a  diffusive  strategy,  instead,  each  node  exchanges  information  with 
all  (or  a  subset  of)  its  neighbors  as  defined  by  an  interaction  graph.  In  this  case,  the  amount  of  communication  and 
computation  is  higher  than  in  the  incremental  case,  but  each  node  possesses  a  good  estimate  before  the  termination 
of  the  algorithm,  since  it  improves  its  estimate  at  each  communication  round.  This  section  is  divided  into  three  parts. 
In  Section  3.2,  we  first  develop  a  distributed  incremental  method  to  compute  the  minimum  norm  solution  to  a  set 
of  linear  equations,  and  then  exploit  such  method  to  solve  a  minimum  variance  estimation  problem.  In  Section  3.3 
we  derive  a  diffusive  strategy  which  is  amenable  to  asynchronous  implementation.  Finally,  in  Section  3.4  we  propose 
a  distributed  algorithm  for  the  detection  of  false  data  among  the  measurements.  Our  detection  procedure  requires 
the  computation  of  the  minimum  variance  state  estimate,  for  which  either  the  incremental  or  the  diffusive  strategy 
can  be  used. 

3.1  Incremental  solution  to  a  set  of  linear  equations 

We  start  by  introducing  a  distributed  incremental  procedure  to  compute  the  minimum  norm  solution  to  a  set  of 
linear  equations.  This  procedure  constitutes  the  key  ingredient  of  the  incremental  method  we  later  propose  to  solve 
the  minimum  variance  estimation  problem.  Let  H  G  W>xm,  and  let  2  G  Im(if),  where  Im(IL)  denotes  the  range  space 
spanned  by  the  matrix  H .  Consider  the  system  of  linear  equations  2  =  Hx,  and  recall  that  the  unique  minimum 
norm  solution  to  z  =  Hx  coincides  with  the  vector  x  such  that  z  =  Hx  and  ||i||2  is  minimum.  It  can  be  shown 
that  ||x||2  being  minimum  corresponds  to  x  being  orthogonal  to  the  null  space  Ker (H)  of  H  [17].  Let  H  and  z  be 
partitioned  in  m  blocks  as  in  (4),  and  let  G  =  (V,£)  be  a  directed  graph  such  that  V  =  {1, . . . ,  m}  corresponds  to  the 
set  of  monitors,  and,  denoting  with  (i,j)  the  directed  edge  from  j  to  i,  £  =  {(i  +  1,  i)  :  i  =  1, . . . ,  m  —  1}  U  {(l,m)}. 
Our  incremental  procedure  to  compute  the  minimum  norm  solution  to  z  =  Hx  is  in  Algorithm  1,  where,  given  a 
subspace  V,  we  write  Basis(V)  to  denote  any  full  rank  matrix  whose  columns  span  the  subspace  V.  We  now  proceed 
with  the  analysis  of  the  convergence  properties  of  the  Incremental  minimum  norm  solution  algorithm. 

Theorem  3.1  (Convergence  of  Algorithm  1)  Let  z  =  Hx,  where  H  and  z  are  partitioned  in  m  row-blocks  as  in  (4). 
In  Algorithm  1,  the  m-th  monitor  returns  the  vector  x  such  that  z  =  Hx  and  x  _L  Ker (H). 


PROOF.  See  Section  6.1. 


It  should  be  observed  that  the  dimension  of  A',;  decreases,  in  general,  when  the  index  i  increases.  In  particular, 
Km  =  {0}  and  Ki  =  Ker(Hi).  To  reduce  the  computational  burden  of  the  algorithm,  monitor  i  could  transmit  the 
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smallest  among  Basis(Aj_i  Ker(HjAj_i))  and  Basis(Aj_i  Ker(.ffj.Kj_i)-L),  together  with  a  packet  containing  the 
type  of  the  transmitted  basis. 

Remark  2  (Computational  complexity  of  Algorithm  1)  In  Algorithm  1,  the  main  operation  to  be  performed 
by  the  i-th  agent  is  a  singular  value  decomposition  (SVD).3  Indeed,  since  the  range  space  and  the  mdl  space  of  a 
matrix  can  be  obtained  through  its  SVD,  both  the  matrices  i)t  and  Basis(A'j_i  Ker(i/j.fQ_i))  can  be  recovered 

from  the  SVD  of  Let  H  £  Rmxra,  m  >  n,  and  assume  the  presence  of  \m/k]  monitors,  1  <  k  <  m.  Recall 

that,  for  a  matrix  M  £  Rkxp,  the  singidar  value  decomposition  can  be  performed  with  complexity  0(mm{kp2 ,  k2p}) 
[10].  Hence,  the  computational  complexity  of  computing  a  minimum  norm  solution  to  the  system  z  =  Hx  is  0(mn2). 
In  Table  1  we  report  the  computational  complexity  of  Algorithm  1  as  a  function  of  the  size  k. 

Table  1 

Computational  complexity  of  Algorithm  1. 


Block  size 

7-th  complexity 

Total  complexity 

Communications 

k  <  n 

0{k2n ) 

Ofmkn ) 

\m/k~\  —  1 

k  >  n 

0(kn2) 

Q(mn2) 

\m/k~\  —  1 

The  following  observations  are  in  order.  First,  if  k  <  n,  then  the  computational  complexity  sustained  by  the  i-th 
monitor  is  much  smaller  than  the  complexity  of  a  centralized  implementation,  i.e.,  0(k2n)  <C  0(mn2).  Second,  the 
complexity  of  the  entire  algorithm  is  optimal,  since,  in  the  worst  case,  it  maintains  the  computational  complexity  of 
a  centralized  solution,  i.e.,  Olinkn )  <  0(mn2).  Third  and  finally,  a  compromise  exists  between  the  blocks  size  k  and 
the  number  of  communications  needed  to  terminate  Algorithm  1.  In  particular,  if  k  =  m,  then  no  communication  is 
needed,  while,  if  k  =  1,  then  m  —  1  communication  rounds  are  necessary  to  terminate  the  estimation  algorithm.  4 

3.2  Incremental  state  estimation  via  distribided  computation 

We  now  focus  on  the  computation  of  the  weighted  least  squares  solution  to  a  set  of  linear  equations.  Let  v  be  an 
unknown  and  unmeasurable  random  vector,  with  E(w)  =  0  and  E(iwt)  =  E  =  ET  >  0.  Consider  the  system  of 
equations 


z  =  Hx  +  v ,  (5) 

and  assume  Ker (77)  =  0.  Notice  that,  because  of  the  noise  vector  v,  we  generally  have  z  Im (77),  so  that  Algorithm 
1  cannot  be  directly  employed  to  compute  the  vector  xwis  defined  in  (3).  It  is  possible,  however,  to  recast  the  above 
weighted  least  squares  estimation  problem  to  be  solvable  with  Algorithm  1.  Note  that,  because  the  matrix  E  is 
symmetric  and  positive  definite,  there  exists5  a  full  row  rank  matrix  B  such  that  E  =  BBT .  Then,  equation  (5)  can 
be  rewritten  as 


H  eB 

X 

V 

(6) 


where  £  £  M>0,  E[u]  =  0  and  E[iwt]  =  e  2I.  Observe  that,  because  B  has  full  row  rank,  the  system  (6)  is 
underdetermined,  i.e.,  2  £  Im([77  eB])  and  Ker([77  eB])  ^  0.  Let 

x{e) 
v 


it 

H  eB  z. 


(7) 


The  following  theorem  characterizes  the  relation  between  the  minimum  variance  estimation  xwis  and  x{e). 

3  The  matrix  H  is  usually  very  sparse,  since  it  reflects  the  network  interconnection  structure.  Efficient  SVD  algorithms  for 
very  large  sparse  matrices  are  being  developed  (cf.  SVDPACK). 

4  Additional  m  —  1  communication  rounds  are  needed  to  transmit  the  estimation  to  every  other  monitor. 

5  Choose  for  instance  B  =  W A1'2 ,  where  IT  is  a  basis  of  eigenvectors  of  E  and  A  is  the  corresponding  diagonal  matrix  of  the 
eigenvalues. 
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Theorem  3.2  (Convergence  with  e)  Consider  the  system  of  linear  equations  z  =  Hx  +  v.  Let  E(c)  =  0  and 
E(ui>t)  =  £  =  BBt  >  0  for  a  fidl  row  rank  matrix  B.  Let 

C  =  e{I  -  HH^)B,  E  =  I-C]C,  D  =  eE[I  +  e2EBt(HHt)'iBE]-1Bt(HHt^(I  -  eBC'1'). 


Then 


and 


r  it 

'  i?t_  £jfft£(c't  +  Dy 

H  eB  = 

C7t  +  D 

lim  H f  -  eH^  B(C^  +D)  =  ( HJ  Y,-1  H)~l  HJ  \ 


PROOF.  See  Section  6.2. 


Throughout  the  paper,  let  x{e)  be  the  vector  defined  in  (7),  and  notice  that  Theorem  3.2  implies  that 

ccwis  =  lim  x(e). 

6^0  + 


Remark  3  (Incremental  state  estimation)  For  the  system  of  equations  z  =  Hx  +  v,  let  BBJ  be  the  covariance 
matrix  of  the  noise  vector  v,  and  let 


- 1 

tnj 

Zi 

h2 

,  B  = 

b2 

,  z  = 

Z2 

_  H  rn  _ 

_  B  rn  _ 

_  _ 

where  €  N,  Hi  €  RmiXra,  Bi  G  RmiXp,  and  Zi  €  Rmi.  For  s  >  0,  the  estimate  x(e)  of  the  weighted  least  squares 
solution  to  z  =  Hx  +  v  can  be  computed  by  means  of  Algorithm  1  with  input  [Hi  eBf\  and  Zi . 


Observe  now  that  the  estimate  x(s)  coincides  with  ccwis  only  in  the  limit  for  e  — ■>  0+.  When  the  parameter  e  is  fixed, 
the  estimate  x(e)  differs  from  the  minimum  variance  estimate  :ew1s-  We  next  characterize  the  approximation  error 

•^wls  *^(^0  • 


Corollary  3.1  (Approximation  error)  Consider  the  system  z  =  Hx  +  v,  and  let  E[uvT]  =  BBJ  for  a  full  row  rank 
matrix  B.  Then 


xw\s  —  x(e)  =  eH^  BDz, 


where  D  is  as  in  Theorem  3.2. 


PROOF.  With  the  same  notation  as  in  the  proof  of  Theorem  3.2,  for  every  value  of  £  >  0,  the  difference  xwis  —  x(e) 
equals  ((i7TE-1iJ)-1RTE-1  -  W  +  eH^ B{C^  +  D))  z.  Since  (. HTY,-1H)-1HTY, -  H'  +  eH^BC^  =  0  for  every 
£  >  0,  it  follows  £wis  —  x(e)  =  eH^BDz. 


Therefore,  for  the  solution  of  system  (5)  by  means  of  Algorithm  1,  the  parameter  £  is  chosen  according  to  Corollary 
3.1  to  meet  a  desired  estimation  accuracy.  It  should  be  observed  that,  even  if  the  entire  matrix  H  needs  to  be  known 


for  the  computation  of  the  exact  parameter  e,  the  advantages  of  our  estimation  technique  are  preserved.  Indeed,  if 
the  matrix  H  is  unknown  and  an  upper  bound  for  ||ift.B.D2||  is  known,  then  a  value  for  e  can  still  be  computed  that 
guarantees  the  desired  estimation  accuracy.  On  the  other  hand,  even  if  H  is  entirely  known,  it  may  be  inefficient 
to  use  H  to  perform  a  centralized  state  estimation  over  time.  Instead,  the  parameter  e  needs  to  be  computed  only 
once.  To  conclude  this  section,  we  characterize  the  estimation  residual  z  —  Hx.  This  quantity  plays  an  important 
role  for  the  synthesis  of  a  distributed  false  data  detection  algorithm. 

Corollary  3.2  (Estimation  residual)  Consider  the  system  z  =  Hx  +  v,  and  let  E[ui>T]  =  E  =  ET  >  0.  Then 6 

lim  \\z-Hx(e)\\<\\(I-HW)\\\\v\\, 

£—>0  + 


where  W  =  (RTE_1fL)_1.ffTE_1. 


PROOF.  By  virtue  of  Theorem  3.2  we  have  lim£_>0+  %(£)  =  ®wis  =  (HJY±  1H)  1HtYi  1z  =  Wz.  Observe  that 
HWH  =  H ,  and  recall  that  2  =  Hx  +  v.  For  any  matrix  norm,  we  have 

\\z-Hxw1a\\  =  \\z-HWz\\  =  \\(I-HW)(Hx  +  v)\\  =  \\Hx  -  Hx  +  (I  -  HW)v ||  <  \\(I  -  HW)\\\\v\\, 

and  the  theorem  follows. 


3.3  Diffusive  state  estimation  via  distributed  computation 

The  implementation  of  the  incremental  state  estimation  algorithm  described  in  Section  3.2  requires  a  certain  degree 
of  coordination  among  the  control  centers.  For  instance,  an  ordering  of  the  monitors  is  necessary,  such  that  the  z-th 
monitor  transmits  its  estimate  to  the  (z  +  l)-th  monitor.  This  requirement  imposes  a  constraint  on  the  monitors 
interconnection  structure,  which  may  be  undesirable,  and,  potentially,  less  robust  to  link  failures.  In  this  section,  we 
overcome  this  limitation  by  presenting  a  diffusive  implementation  of  Algorithm  1,  which  only  requires  the  monitors 
interconnection  structure  to  be  connected.  '  To  be  more  precise,  let  V  =  {1, . . .  ,in}  be  the  set  of  monitors,  and 
let  G  =  (V,  E )  be  the  undirected  graph  describing  the  monitors  interconnection  structure,  where  E  C  V  x  V,  and 
(i,  j)  €  E  if  and  only  if  the  monitors  i  and  j  are  connected.  The  neighbor  set  of  node  i  is  defined  as  iVj  =  {j  £  V  : 
(i,j)  £  E}.  We  assume  that  G  is  connected,  and  we  let  the  distance  between  two  vertices  be  the  minimum  number  of 
edges  in  a  path  connecting  them.  Finally,  the  diameter  of  a  graph  G,  in  short  diam(G),  equals  the  greatest  distance 
between  any  pair  of  vertices.  Our  diffusive  procedure  is  described  in  Algorithm  2,  where  the  matrices  Hi  and  eBi 
are  as  defined  in  equation  (8).  During  the  h- th  iteration  of  the  algorithm,  monitor  i,  with  i  £  {1, . . . ,  N},  performs 
the  following  three  actions  in  order: 

(i)  transmits  its  current  estimates  Xi  and  Ki  to  all  its  neighbors; 

(ii)  receives  the  estimates  Xj  from  neighbors  Np,  and 

(iii)  updates  x:-,  and  K,  as  in  the  for  loop  of  Algorithm  2. 

We  next  show  the  convergence  of  Algorithm  2  to  the  minimum  variance  estimate. 

Theorem  3.3  (Convergence  of  Algorithm  2)  Consider  the  system  of  linear  equations  z  =  Hx+v,  where  E[v]  = 
0  and  E[zn<T]  =  BBJ .  Let  H ,  B  and  z  be  partitioned  as  in  (8),  and  let  e  >  0.  Let  the  monitors  communication  graph 
be  connected,  let  d  be  its  diameter,  and  let  the  monitors  execute  the  Diffusive  state  estimation  algorithm.  Then,  each 
monitor  computes  the  estimate  x(e)  of  x  in  d  steps. 


PROOF.  Let  Xi  be  the  estimate  of  the  monitor  i,  and  let  Kt  be  such  that  x  —  Xi  £  Im(A^),  where  x  denotes  the 
network  state,  and  Xi  _L  Im(A’f).  Notice  that  Zi  =  [Hi  eBi\Xi,  where  Z{  it  the  z-th  measurements  vector.  Let  z  and  j 


6  Given  a  vector  v  and  a  matrix  H ,  we  denote  by  ||v||  any  vector  norm,  and  by  ||A||  the  corresponding  induced  matrix  norm. 
'  An  undirected  graph  is  said  to  be  connected  if  there  exists  a  path  between  any  two  vertices  [9]. 
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Algorithm  2  Diffusive  state  estimation  (i-th  monitor) 
Input:  Hi,  -If.  Zi\ 

Xi  •  [Hi  ^Bj)^  Zj, 

Ki  :=  Basis  (Ker^H)  eS*])); 

while  Ki  ^  0  do 
for  j  £  Ni  do 

receive  Xj  and  Kj\ 

Xi  .  Xi  T  [Ki  0]  [  I\-i  Kj]^  (x^  "Vj)} 

Ki  :=  Basis(Im(A.j)  nlm(ATj)); 

end  for 

transmit  Xi  and  K, : 

end  while 


be  two  neighboring  monitors.  Notice  that  there  exist  vectors  v,  and  Vj  such  that  Xi  +  KiVi  =  Xj  +  KjVj.  In  particular, 
those  vectors  can  be  chosen  as 


=  [~Ki  Kf]\xi  -  xj). 


It  follows  that  the  vector 


xf  =  Xi  +  [I<i  0][— A)  Kj]\xi  -  xj) 

is  such  that  Zi  =  [Hi  eB.^xf  and  Zj  =  [Hj  eBj]xf.  Moreover  we  have  xf  T  (Im(Aj)  nlm(Aj)).  Indeed,  notice  that 


Vi 


l  Ker {{-Ki  Kj])  D 


Wi 


L^J 


:  KjWi  =  KjWj 


We  now  show  that  KiV,  _L  Im(Kj).  By  contradiction,  if  KjVj  )L  Im(/\j),  then  Uj  =  +  Vi,  with  Kfiji  _L  Im(A'j) 

and  KiVi  £  lm(Kj).  Let  Vj  =  KjKiVi,  and  Vj  =  Vj  —  Vj.  Then,  [vj  uJ]T  £  Ker([— Ki  Kj ]),  and  hence  [vj  uJ]T  / 
Ker([— Ki  Kj]),  which  contradicts  the  hypothesis.  We  conclude  that  [Kt  0][— K  Kj]^ (x,  —  Xj)  _L  Im(Aj),  and, 
since  Xi  _L  Im(A'i),  it  follows  xf  _L  (Im(Aj)  n  Im(A,)).  The  theorem  follows  from  the  fact  that  after  a  number  of 
steps  equal  to  the  diameter  of  the  monitors  communication  graph,  each  vector  Xi  verifies  all  the  measurements,  and 
Xi  T  Im(A'!)  n  •  •  •  n  im(Arm). 


As  a  consequence  of  Theorem  3.2,  in  the  limit  for  e  to  zero,  Algorithm  2  returns  the  minimum  variance  estimate 
of  the  state  vector,  being  therefore  the  diffusive  counterpart  of  Algorithm  1.  A  detailed  comparison  between  incre¬ 
mental  and  diffusive  methods  is  beyond  the  purpose  of  this  work,  and  we  refer  the  interested  reader  to  [15,  16]  and 
the  references  therein  for  a  thorough  discussion.  Here  we  only  underline  some  key  differences.  While  Algorithm  1 
requires  less  operations,  being  therefore  computationally  more  efficient,  Algorithm  2  does  not  constraint  the  monitors 
communication  graph.  Additionally,  Algorithm  2  can  be  implemented  adopting  general  asynchronous  communica¬ 
tion  protocols.  For  instance,  consider  the  Asynchronous  (diffusive)  state  estimation  algorithm,  where,  at  any  given 
instant  of  time  in  N,  at  most  one  monitor,  say  j,  sends  its  current  estimates  to  its  neighbors,  and  where,  for  i  £  Nj, 
monitor  i  performs  the  following  operations: 

(i)  Xi  . —  Xi  T  [Ki  0]  [  Ki  K j]^ (xi  *Vj)i 

(ii)  I<i  :=  Basis(Im(A'j)  Dlm(A'j)). 

Corollary  3.3  (Asynchronous  estimation)  Consider  the  system  of  linear  equations  z  =  Hx  +  v,  where  E[v]  =  0 
and  E[uvT]  =  BBJ .  Let  H,  B  and  z  he  partitioned  as  in  (8),  and  let  e  >  0.  Let  the  monitors  communication  graph  be 
connected,  let  d  he  its  diameter,  and  let  the  monitors  execute  the  Asynchronous  (diffusive)  state  estimation  algorithm. 
Assume  that  there  exists  a  duration  T  £  N  such  that,  within  each  time  interval  of  duration  T,  each  monitor  transmits 
its  current  estimates  to  its  neighbors.  Then,  each  monitor  computes  the  estimate  x(e)  of  x  within  time  dT. 
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Algorithm  3  False  data  detection  (i-th  monitor) 
Input:  Hi,  sBi,  T; 
while  True  do 

collect  measurements  Zi(t)\ 

estimate  network  state  x(t)  via  Algorithm  1  or  2 
if  || Zi(t)  -  Hi,x(t) ||oo  >  r  then 
return  False  data  detected; 

end  if 
end  while 


PROOF.  The  proof  follows  from  the  following  two  facts.  First,  the  intersection  of  subspaces  is  a  commutative 
operation.  Second,  since  each  monitor  performs  a  data  transmission  within  any  time  interval  of  length  T,  it  follows 
that,  at  time  d.T,  the  information  related  to  one  monitor  has  propagated  through  the  network  to  every  other  monitor. 


3-4  Detection  of  false  data  via  distributed  computation 


In  the  previous  sections  we  have  shown  how  to  compute  an  optimal  state  estimate  via  distributed  computation.  A 
rather  straightforward  application  of  the  proposed  state  estimation  technique  is  the  detection  of  false  data  among 
the  measurements.  When  the  measurements  are  corrupted,  the  state  to  measurements  relation  becomes 

z  =  Hx  +  v  +  w, 

where  w  is  the  false  data  vector.  As  a  consequence  of  Corollary  3.2,  the  vector  w  is  detectable  if  it  affects  significantly 
the  estimation  residual,  i.e. ,  if  lime^o  || z  —  Hx(e)\\  >  T,  where  the  threshold  T  depends  upon  the  magnitude  of  the 
noise  v.  Notice  that,  because  false  data  can  be  injected  at  any  time  by  a  malignant  agent,  the  detection  algorithm 
needs  to  be  executed  over  time  by  the  control  centers.  Let  z(t)  be  the  measurements  vector  at  a  given  time  instant 
t,  and  let  E[2(ti)zT(t2)]  =  0  for  all  t\  ^  t2.  Based  on  this  considerations,  our  distributed  detection  procedure  is  in 
Algorithm  3,  where  the  matrices  Hi  and  eBi  are  as  defined  in  equation  (8),  and  T  is  a  predefined  threshold. 

In  Algorithm  3,  the  value  of  the  threshold  T  determines  the  false  alarm  and  the  misdetection  rate.  Clearly,  if 
r  >  ||  (J  —  ffkF)||||u(f)||  and  e  is  sufficiently  small,  then  no  false  alarm  is  triggered,  at  the  expenses  of  the  misdetection 
rate.  By  decreasing  the  value  of  T  the  sensitivity  to  failures  increases  together  with  the  false  alarm  rate.  Notice  that,  if 
the  magnitude  of  the  noise  signals  is  bounded  by  7,  then  a  reasonable  choice  of  the  threshold  is  T  =  -y ||  (Z  —  HW)\\oo, 
where  the  use  of  the  infinity  norm  in  Algorithm  3  is  also  convenient  for  the  implementation.  Indeed,  since  the 
condition  \\z(t)  —  R'x(t)||00  >  T  is  equivalent  to  || Zi(t)  —  #»&(£)  ||oo  >  B  for  some  monitor  i,  the  presence  of  false  data 
can  be  independently  checked  by  each  monitor  without  further  computation.  Notice  that  an  eventual  alarm  message 
needs  to  be  propagated  to  all  other  control  centers. 


Remark  4  (Statistical  detection)  A  different  strategy  for  the  detection  of  false  data  relies  on  statistical  tech¬ 
niques,  e.g.,  see  [1].  In  the  interest  of  brevity,  we  do  not  consider  these  methods,  and  we  only  remark  that,  once  the 
estimation  residual  has  been  computed  by  each  monitor,  the  implementation  of  a  (distributed)  statistical  procedure, 
such  as,  for  instance,  the  (distributed)  x2~Test,  is  a  straightforward  task. 


4  A  finite-memory  estimation  technique 


The  procedure  described  in  Algorithm  1  allows  each  agent  to  compute  an  optimal  estimate  of  the  whole  network 
state  in  finite  time.  In  this  section,  we  allow  each  agent  to  handle  only  local,  i.e.,  of  small  dimension,  vectors,  and 
we  develop  a  procedure  to  recover  an  estimate  of  only  a  certain  subnetwork.  We  envision  that  the  knowledge  of  only 
a  subnetwork  may  be  sufficient  to  implement  distributed  estimation  and  control  strategies. 


We  start  by  introducing  the  necessary  notation.  Let  the  measurements  matrix  H  be  partitioned  into  m2,  being  m 
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the  number  of  monitors  in  the  network,  blocks  as 


H  = 


Hu  ■  ■  ■  H im 


Hml  '  '  '  Hn 


(9) 


where  H,j  £  RmiXni  for  all  i,j  £  {1, . . .  ,?n}.  The  above  partitioning  reflects  a  division  of  the  whole  network  into 
competence  regions:  we  let  each  monitor  be  responsible  for  the  correct  functionality  of  the  subnetwork  defined  by  its 
blocks.  Additionally,  we  assume  that  the  union  of  the  different  regions  covers  the  whole  network,  and  that  different 
competence  regions  may  overlap.  Observe  that,  in  most  of  the  practical  situations,  the  matrix  H  has  a  sparse 
structure,  so  that  many  blocks  Hij  have  only  zero  entries.  We  associate  an  undirected  graph  Gh  with  the  matrix  H , 
in  a  way  that  Gh  reflects  the  interconnection  structure  of  the  blocks  Htj .  To  be  more  precise,  we  let  Gh  =  ( 14 , 
where  14  =  {1, ,  m}  denotes  the  set  of  monitors,  and  where,  denoting  by  (i,j)  the  undirected  edge  from  j  to  i,  it 
holds  (i,j)  £  £h  if  and  only  if  \\Hij\\  4  0  or  \\Hji\\  yf  0.  Noticed  that  the  structure  of  the  graph  Gh,  which  reflects 
the  sparsity  structure  of  the  measurement  matrix,  describes  also  the  monitors  interconnections.  By  using  the  same 
partitioning  as  in  (9),  the  Moore-Penrose  pseudoinverse  of  H  can  be  written  as 


if*  =  H  = 


Hi 


H 


ml 


Hi 


I  In 


(10) 


where  H,:l  £  M.niXmi.  Assume  that  H  has  full  row  rank,  8  and  observe  that  =  HT{HHT)  1.  Consider  the 
equation  z  =  Hx,  and  let  W z  =  x  =  [£’7  ■  • .  where,  for  all  i  £  {1, . . . ,  m},  4;  G  KHi.  We  employ  Algorithm  2 
for  the  computation  of  the  vector  x,  and  we  let 


x^h) 


x 


(i,h) 

m 


be  the  estimate  vector  of  the  i-th  monitor  after  h  iterations  of  Algorithm  2,  i.e.,  after  h  executions  of  the  while  loop 
in  Algorithm  2.  In  what  follows,  we  will  show  that,  for  a  sufficiently  sparse  matrix  H ,  the  error  \\x.i  —  x^’^W  has  an 
exponential  decay  when  h  increases,  so  that  it  becomes  negligible  before  the  termination  of  Algorithm  2,  i.e.,  when 
h  <  diam(Gfc,).  The  main  result  of  this  section  is  next  stated. 

Theorem  4.1  (Local  estimation)  Let  the  full-row  rank  matrix  H  be  partitioned  as  in  (9).  Let  [a,b\,  with  a  <  b, 
be  the  smallest  interval  containing  the  spectrum  of  HHT .  Then,  for  i  £  {1, . . .  ,m}  and  h  £  N,  there  exists  C  £  M>o 
and  q  £  (0, 1)  such  that 


\Xj  —  x 


(*A)  i 


<  Cq 


r+l 


Before  proving  the  above  result,  for  the  readers  convenience,  we  recall  the  following  definitions  and  results.  Given 
an  invertible  matrix  M  of  dimension  n,  let  us  define  the  support  sets 

h 

Sh(M)=  \J{(i,j):Mk(i,j)^0}, 

k= 0 

being  Mk(i,j)  the  th  entry  of  Mk,  and  the  decay  sets 

Dh{M)  =  ({1,  ■  •  •  ,n}  x  {1, . . . ,  n})  \  Sh(M). 

8  The  case  of  a  full-column  rank  matrix  is  treated  analogously. 
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Theorem  4.2  (Decay  rate  [7])  Let  M  be  of  full  row  rank,  and  let  [a,  b\,  with  a  <  b,  be  the  smallest  interval 
containing  the  spectrum  of  M.  There  exist  C  €  R>o  and  q  G  (0,1)  such  that 

sup{|Mt(»,  j)|  :  (t,  j)  G  Dh(MMT)}  <  Cqh+1. 


For  a  graph  Gh  and  two  nodes  i  and  j,  let  dist  (i,  j)  denote  the  smallest  number  of  edges  in  a  path  from  j  to  i  in  Gh  ■  The 
following  result  will  be  used  to  prove  Theorem  4.1.  Recall  that,  for  a  matrix  M,  we  have  ||M||max  =  max{|M(i,  j)|}. 

Lemma  4.1  (Decay  sets  and  local  neighborhood)  Let  the  matrix  H  be  partitioned  as  in  (9)  ,  and  let  Gh  be  the 
graph  associated  with  H.  For  i,j  G  {1, . . .  ,  to},  z/dist(i,  j)  =  h,  then 

lKl|max<C'^+1. 

PROOF.  The  proof  can  be  done  by  simple  inspection,  and  it  is  omitted  here. 


Lemma  4.1  establishes  a  relationship  between  the  decay  sets  of  an  invertible  matrix  and  the  distance  among  the 
vertices  of  a  graph  associated  with  the  same  matrix.  By  using  this  result,  we  are  now  ready  to  prove  Theorem  4.1. 


PROOF.  [Proof  of  Theorem  4.1]  Notice  that,  after  h  iterations  of  Algorithm  2,  the  i-th  monitor  has  received  data 
from  the  monitors  within  distance  h  from  i,  i.e.,  from  the  monitors  T  such  that,  for  each  j  G  T,  there  exists  a  path 
of  length  up  to  h  from  j  to  i  in  the  graph  associated  with  H.  Reorder  the  rows  of  H  such  that  the  i-th  block  come 
first  and  the  T-th  blocks  second.  Let  H  =  [Hj  Hf  Pj]T  be  the  resulting  matrix.  Accordingly,  let  2  =  [zj  zj  zJ]T, 
and  let  x  =  [xf  xj  a;J]T,  where  z  =  Hx. 

Because  H  has  full  row  rank,  we  have 

Pn  Pi  2  P13  h  0  0  Pn  P\  2  P13 

P‘2\  P22  P23  =  0  I2  0  ,  =  P21  P22  P23  1 

P31  P32  P33  OO/3  P31  P32  P33 

where  I\,  I2 ,  and  I3  are  identity  matrices  of  appropriate  dimension. For  a  matrix  M,  let  col(M)  denote  the  number 
of  columns  of  M.  Let  Ti  =  {1, . . . , col(Pn)},  T2  =  {1  +  col(Pn), . . . , col([Pn  P12])},  and 

T3  =  {1  +  col([Pn  P12]), . . .  ,col([Pu  P12  Pis])}- 

Let  Ti,  T2,  and  P3,  be,  respectively,  the  indices  of  the  columns  of  Pn,  P12,  and  P13.  Notice  that,  by  construction,  if 
i  G  Pi  and  j  G  X3,  then  dist(i,  j)  >  h.  Then,  by  virtue  of  Lemma  4.1  and  Theorem  4.2,  the  magnitude  of  each  entry 
of  P13  is  bounded  by  Cq^i J+1,  for  C,gGR. 

Because  H  has  full  row  rank,  from  Theorem  3.1  we  have  that 

x  =  H^z  =  x  +  K1(H3Ki)1<(z3  -  H3X1),  (11) 

where 

x  =  [Hj  z1Y  an(l  Ki  =  Basis(Ker([P]r  i?J]T)). 

With  the  same  partitioning  as  before,  let  x  =  [a:}  xj  s}]T.  In  order  to  prove  the  theorem,  we  need  to  show  that  there 
exists  C  G  R>o  and  q  G  (0, 1)  such  that 

II*!  -*i||  <  Cq^ J+1. 
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Fig.  2.  In  Fig.  2(a),  the  normalized  euclidean  norm  of  the  error  vector  #bus(e) —  #bus,wis  is  plotted  as  a  function  of  the  parameter 
e,  where  0bus(e)  is  the  estimation  vector  computed  according  to  Theorem  3.2,  and  #bus,wis  is  the  minimum  variance  estimate 
of  #buS.  As  £  decreases,  the  vector  #bus(e)  converges  to  the  minimum  variance  estimate  #bus,wis-  In  Fig.  2(b),  the  IEEE  118  bus 
system  has  been  divided  into  5  areas.  Each  area  is  monitored  and  operated  by  a  control  center.  The  control  centers  cooperate 
to  estimate  the  state  and  to  assess  the  functionality  of  the  whole  network. 

Notice  that,  for  (11)  to  hold,  the  matrix  K i  can  be  any  basis  of  KerQffJ  Pj]T).  Hence,  let  I\\  =  [P33  Pj3  Pj3}. 
Because  every  entry  of  P13  decays  exponentially,  the  theorem  follows. 


In  Section  5.2  we  provide  an  example  to  clarify  the  exponential  decay  described  in  Theorem  4.1. 


5  An  illustrative  example 

The  effectiveness  of  the  methods  developed  in  the  previous  sections  is  now  shown  through  some  examples. 

5.1  Estimation  and  detection  for  the  IEEE  118  bus  system 

The  IEEE  118  bus  system  represents  a  portion  of  the  American  Electric  Power  System  as  of  December,  1962.  This 
test  case  system,  whose  diagram  is  reported  in  Fig.  1(b),  is  composed  of  118  buses,  186  branches,  54  generators,  and 
99  loads.  The  voltage  angles  and  the  power  injections  at  the  network  buses  are  assumed  to  be  related  through  the 
linear  relation 


PfjUS 


Pbus^b 


US  } 


where  the  matrix  Pbus  depends  upon  the  network  interconnection  structure  and  the  network  admittance  matrix. 
For  the  network  in  Fig.  1(b),  let  z  =  Pbus  —  v  be  the  measurements  vector,  where  E[i>]  =  0  and  E[twT]  =  cr2/,  cr  £  K. 
Then,  following  the  notation  in  Theorem  3.2,  the  minimum  variance  estimate  of  $bus  can  be  recovered  as 

lim  [Pbus  eal^z. 

£— >0+  L 


In  Fig.  2(a)  we  show  that,  as  e  decreases,  the  estimation  vector  computed  according  to  Theorem  3.2  converges  to 
the  minimum  variance  estimate  of  $bus- 

In  order  to  demonstrate  the  advantage  of  our  decentralized  estimation  algorithm,  we  assume  the  presence  of  5 
control  centers  in  the  network  of  Fig.  1(b),  each  one  responsible  for  a  subpart  of  the  entire  network.  The  situation  is 
depicted  in  Fig.  2(b).  Assume  that  each  control  center  measures  the  real  power  injected  at  the  buses  in  its  area,  and 
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n°  of  measurements  (xw)  ime 

(a)  (b) 


Fig.  3.  For  a  fixed  value  of  e,  Fig.  3(a)  shows  the  average  (over  100  tests)  of  the  norm  of  the  error  (with  respect  to  the  network 
state)  of  the  estimate  obtained  by  means  of  Algorithm  1.  The  estimation  error  decreases  with  the  number  of  measurements. 
Because  of  the  presence  of  several  control  centers,  our  distributed  algorithm  processes  more  measurements  (up  to  5 N)  while 
maintaining  the  same  (or  smaller)  computational  complexity  of  a  centralized  estimation  (with  N  measurements).  Fig.  3(b) 
shows  the  residual  functions  computed  by  the  5  control  centers.  Since  the  first  residual  is  greater  than  the  threshold  value,  the 
presence  of  false  data  is  correctly  detected  by  the  first  control  center.  A  form  of  regional  identification  is  possible  by  simple 
identifying  the  residuals  above  the  security  threshold. 

let  Zi  =  Pbus.i  —  u»,  with  E[u,j]  =  0  and  E^tT]  =  of/,  be  the  measurements  vector  of  the  i-tli  area.  Finally,  assume 
that  the  i-tli  control  center  knows  the  matrix  Pbus.i  such  that  Zi  =  Pbus.i^bus  +  ty.  Then,  as  discussed  in  Section  3, 
the  control  centers  can  compute  an  optimal  estimate  of  0bus  by  means  of  Algorithm  1  or  2.  Let  n,;  be  the  number  of 
measurements  of  the  i-tli  area,  and  let  N  =  Xu=i  ni-  Notice  that,  with  respect  to  a  centralized  computation  of  the 
minimum  variance  estimate  of  the  state  vector,  our  estimation  procedure  obtains  the  same  estimation  accuracy  while 
requiring  a  smaller  computation  burden  and  memory  requirement.  Indeed,  the  i-tli  monitor  uses  rii  measurements 
instead  of  N.  Let  N  be  the  maximum  number  of  measurements  that,  due  to  hardware  or  numerical  contraints, 
a  control  center  can  efficiently  handle  for  the  state  estimation  problem.  In  Fig.  3(a),  we  increase  the  number  of 
measurements  taken  by  a  control  center,  so  that  n,;  <  N,  and  we  show  how  the  accuracy  of  the  state  estimate 
increases  with  respect  to  a  single  control  center  with  N  measurements. 

To  conclude  this  section,  we  consider  a  security  application,  in  which  the  control  centers  aim  at  detecting  the  presence 
of  false  data  among  the  network  measurements  via  distributed  computation.  For  this  example,  we  assume  that  each 
control  center  mesures  the  real  power  injection  as  well  the  current  magnitude  at  some  of  the  buses  of  its  area.  By 
doing  so,  a  sufficient  redundancy  in  the  measurements  is  obtained  for  the  detection  to  be  feasible  [1] .  Suppose  that 
the  measurements  of  the  power  injection  at  the  first  bus  of  the  first  area  is  corrupted  by  a  malignant  agent.  To 
be  more  precise,  let  the  measurements  vector  of  the  first  area  be  Zi  =  Zi  +  e\U\,  where  ei  is  the  first  canonical 
vector,  and  Wi  is  a  random  variable.  For  the  simulation  we  choose  Wi  to  be  uniformly  distributed  in  the  interval 
[0,  u,’max] ,  where  u>max  corresponds  approximately  to  the  10%  of  the  nominal  real  injection  value.  In  order  to  detect 
the  presence  of  false  data  among  the  measurements,  the  control  centers  implement  Algorithm  3,  where,  being  H 
the  measurements  matrix,  and  a,  E  the  noise  standard  deviation  and  covariance  matrix,  the  threshold  value  T  is 
chosen  as  2oj|  J  —  P(PTE_1P)_1PTE_1||00.  9  The  residual  functions  \\zi~Hx  are  reported  in  Fig.  3(b).  Observe 
that,  since  the  first  residual  is  greater  than  the  threshold  T,  the  control  centers  successfully  detect  the  false  data. 
Regarding  the  identification  of  the  corrupted  measurements,  we  remark  that  a  regional  identification  may  be  possible 
by  simply  analyzing  the  residual  functions.  In  this  example,  for  instance,  since  the  residuals  {2, . . . ,  5}  are  below  the 
threshold  value,  the  corrupted  data  is  likely  to  be  among  the  measurements  of  the  first  area.  This  important  aspect 
is  left  as  the  subject  of  future  research. 

5.2  Scalability  property  of  our  finite-memory  estimation  technique 

Consider  an  electrical  network  with  ( ab )2  buses,  where  a,  b  £  N.  Let  the  buses  interconnection  structure  be  a  two 
dimensional  lattice,  and  let  G  be  the  graph  whose  vertices  are  the  (ab)2  buses,  and  whose  edges  are  the  network 
branches.  Let  G  be  partitioned  into  b2  identical  blocks  containing  a2  vertices  each,  and  assume  the  presence  of  62 
control  centers,  each  one  responsible  for  a  different  network  part.  We  assume  the  control  centers  to  be  interconnected 
through  an  undirected  graph.  In  particular,  being  V)  the  set  of  buses  assigned  to  the  control  center  C,;,  we  let  the 
control  centers  C i  and  Cj  be  connected  if  there  exists  a  network  branch  linking  a  bus  in  V)  to  a  bus  in  Vj .  An  example 

9  For  a  Gaussian  distribution  with  mean  p  and  variance  <r2,  about  95%  of  the  realizations  are  contained  in  [p  —  2<r,  p  +  2cr\. 
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Fig.  4.  In  Fig.  4(a),  a  two  dimensional  power  grid  with  400  buses.  The  network  is  operated  by  16  control  centers,  each  one 
responsible  for  a  different  subnetwork.  Control  centers  cooperate  through  the  red  communication  graph.  Fig.  4(b)  shows  the 
norm  of  the  estimation  error  of  the  local  subnetwork  as  a  function  of  the  number  of  iterations  of  Algorithm  2.  The  considered 
monitors  are  C\  ,Ce,  Cn,  and  C15.  As  predicted  by  Theorem  4.1,  the  local  estimation  error  becomes  negligible  before  the 
termination  of  the  algorithm. 

with  6  =  4  and  a  =  5  is  in  Fig.  4(a).  In  order  to  show  the  effectiveness  of  our  approximation  procedure,  suppose  that 
each  control  center  Cj  aims  at  estimating  the  vector  of  the  voltage  angles  at  the  buses  in  its  region.  We  assume  also 
that  the  control  centers  cooperate,  and  that  each  of  them  receives  the  measurements  of  the  real  power  injected  at 
only  the  buses  in  its  region.  Algorithm  2  is  implemented  by  the  control  centers  to  solve  the  estimation  problem.  In 
Fig.  4(b)  we  report  the  estimation  error  during  the  iterations  of  the  algorithm.  Notice  that,  as  predicted  by  Theorem 
4.1,  each  leader  possess  a  good  estimate  of  the  state  of  its  region  before  the  termination  of  the  algorithm. 


6  Conclusion 


Two  distributed  algorithms  for  network  control  centers  to  compute  the  minimum  variance  estimate  of  the  network 
state  given  noisy  measurements  have  been  proposed.  The  two  methods  differ  in  the  mode  of  cooperation  of  the 
control  centers:  the  first  method  implements  an  incremental  mode  of  cooperation,  while  the  second  uses  a  diffusive 
interaction.  Both  methods  converge  in  finite  time,  which  we  characterize,  and  they  require  only  local  measurements 
and  model  knowledge  to  be  implemented.  Additionally,  an  asynchronous  and  scalable  implementation  of  our  diffusive 
estimation  method  has  been  described,  and  its  efficiency  has  been  shown  through  a  rigorous  analysis  and  through 
a  practical  example.  Based  on  these  estimation  methods,  an  algorithm  to  detect  cyber-attacks  against  the  network 
measurements  has  also  been  developed,  and  its  detection  performance  has  been  characterized. 

APPENDIX 

6. 1  Proof  of  Theorem  3. 1 


PROOF.  Let  Hl  =  [Hj  ■  ■  ■  Hj]T ,  zl  =  [zj  ■  ■  ■  zJ]T .  We  show  by  induction  that  zl  =  HlXi,  K,  =  Basis(Ker(FP)), 
and  Xi  J_  Ker (Hl).  Note  that  the  statements  are  trivially  verified  for  1=1.  Suppose  that  they  are  verified  up  to  i, 
then  we  need  to  show  that  Ki+ 1  =  Basis(Ker(FTi+1)),  Xi+ 1  _L  Ker(FF+1),  and  zl+1  =  Hz+1Xi+ 

We  start  by  proving  that  Ki+ 1  =  Basis(Ker(FP+1)).  Observe  that  Ker (Kf)  =  0  for  all  i,  and  that 


Ker (Hi+1Ki)  =  {u  :  Ktv  G  Ker (Hl+1)}. 


(A-l) 


Hence, 


Im(FQ+1)  =  lm{KtKer{Hi+1lQ)  =  Im^)  O  Ker(J?i+1)  =  Ker(FP)  D  Ker{Hl+1)  =  Ker(FP+1). 
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We  now  show  that  Xi+  \  i,  Ker (Hl+1),  which  is  equivalent  to 

xi+i  =  ( Xi  +  Ki{Hl+lK^ (zi+i  -  Hi+1Xi ))  G  Ker(Hl+1)±. 

Note  that 

Ker (Hi+1)  C  Ker (iP)  Ker(Fi+1)J-  D  Ker (TP)-1. 

By  the  induction  hypothesis  we  have  Xi  G  Ker(iJ*)-L,  and  hence  Xi  G  Ker(iJ*+1)-L.  Therefore,  we  need  to  show  that 

Ki(Hi+1Ki)\zi+1  -  Hi+1Xi)  G  KeviW+Y- 

Let  w  =  (Hi+iKi)^ (zi+i  —  Hi+iXi),  and  notice  that  w  G  Ker(iL,+i  A',;)1-  due  to  the  properties  of  the  pseudoinverse 
operation.  Suppose  that  KiW  qL  Ker(7Li_)_i)-L.  Since  Ker(iv.i)  =  {0},  the  vector  w  can  be  written  as  w  =  W1+W2,  where 
KiW  1  G  Ker(iLi+i)±  and  K^w 2  =  KiW  —  KiW\  ^  0,  Klw-2  G  Ker(LL+i).  Then,  it  holds  Hi+\KiW2  =  0,  and  hence 
W2  G  Ker(Hi+iKi),  which  contradicts  the  hypothesis  w  G  Ker(iLi+i/vi)-L.  We  conclude  that  K,w  G  Ker(iLJ+i)-L  C 
Ker  (Hi+1)±. 

We  now  show  that  zt+1  =  Hl+1Xi+ 1.  Because  of  the  consistency  of  the  system  of  linear  equations,  and  because  zl  = 
HlXi  by  the  induction  hypothesis,  there  exists  a  vector  Vi  G  Ker(LP)  =  Im(K'j)  such  that  zt+1  =  Hl+1(xi  +  Vi ),  and 
hence  that  zi+ 1  =  Hi+i(xi  +  Uj).  We  conclude  that  (zi+i  —  Hi+1Xi)  G  Im(iJi+1K'j),  and  finally  that  zl+1  =  Hl+1xi+ j. 


6.2  Proof  of  Theorem  3.2 

Before  proceeding  with  the  proof  of  the  above  theorem,  we  recall  the  following  fact  in  linear  algebra. 
Lemma  6.1  Let  H  G  Rnxm.  Then  Ker ((iP)T)  =  Ker (H). 


PROOF.  We  first  show  that  Ker((LP)T)  C  Ker (H).  Recall  from  [2]  that  H  =  H HJ ( Hf)T .  Let  x  be  such  that 
(i/')Tx  =  0,  then  Hx  =  HHT(H^)Tx  =  0,  so  that  Ker((.fP)T)  C  Ker (H).  We  now  show  that  Ker(Lf)  C  Ker((LP)T). 
Recall  that  (fP)T  =  (HT)J  =  H.  Let  x  be  such  that  Hx  =  0,  then  (LP)t x  =  (HHJ)^  Hx  =  0,  so  that 

Ker(if)  C  Ker((LP)T),  which  concludes  the  proof. 


We  are  now  ready  to  prove  Theorem  3.2. 


PROOF.  The  first  property  follows  directly  from  [2]  (cfr.  page  427).  To  show  the  second  property,  observe  that 
C+  =  so  that 


lim  eD  =  0. 

£—>0+ 

For  the  theorem  to  hold,  we  need  to  verify  that 

-  HW  )By  =  (iLTE-1£f)-1iLTE-1, 

or,  equivalently,  that 

(W  -  H^B((I  -  iLiLt)R)t)  HH f  (A-2) 

and 

(iP  -  H^B((I  -  £Tfft)R)t)  (/  -  HH f)  =  (tfTE-1ff)-1JPrE-1(J  -  HH t).  (A-3) 
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Consider  equation  (A-2).  After  simple  manipulation,  we  have 


77'  -  H*B((I  -  7777t).B)t7777t  =  H\ 

so  that  we  need  to  show  only  that 

77^(7  -  =  0. 

Recall  that  for  a  matrix  W  it  holds  =  (WTW)^WT .  Then  the  term  ((/  —  7777^)77)^777/^  equals 

(((/  -  7777t).B)T((7  -  7777t).B))t  Bt(I  -  7777t)7777t  =  0, 

because  (7  —  HH^)HH^  =  0.  We  conclude  that  equation  (A-2)  holds.  Consider  now  equation  (A-3).  Observe  that 
7777' (7  —  7777' )  =  0.  Because  B  has  full  row  rank,  and  £  =  BBT ,  simple  manipulation  yields 

-  Ht(BBt)~1HH"iB  [(/  -  HH*)B] 1  (7  -  HH*)B  =  HT(BBT)~1(I  -  HH*)B, 

and  hence 

HT(BBT)~1  |/  +  7777*5  [(/  -  HH^B] 1 }  (7  -  HH^B  =  0. 

Since  7777*  =  I  —  (7  —  7777*),  we  obtain 

HT(BBT)-iB  [(7  -  7777*)5] f  (7  -  7777*)5  =  0. 

A  sufficient  condition  for  the  above  equation  to  be  true  is 

( [(7  -  hh^)b]  t  BT(BBT)~1H  =  0. 

From  Lemma  6.1  we  have. 

Ker  ^[(7-  AA*)#]1)^  =  Ker((7  —  AA*)77). 

Since 

(7  -  HH^)BBt(BBt)~1H  =  (7  -  7777t)77  =  0, 

we  have  that 

Ht(BBt)-:B  [(I  -  HH^B] f  (7  -  7777*)5  =  0, 
and  that  equation  (A-3)  holds.  This  concludes  the  proof. 
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